Fix spark application spans status on sql analysis failure #10981
Add lastSqlFailed tracking to AbstractDatadogSparkListener when SQL calls (e.g. SparkSession.sql()) throw exceptions during Catalyst analysis, before any Spark job is submitted. This ensures finishApplication() can mark the application span as ERROR even when no job/stage/task events fire.

The error priority in finishApplication() is:
throwable (from caller) > exitCode != 0 > lastJobFailed > lastSqlFailed

Add unit tests to verify SQL failures mark application spans as ERROR, and that job failures take precedence over SQL failures.

Fixes: Spark application traces marked SUCCESS when SQL analysis fails
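The priority order stated in this commit message can be illustrated with a small standalone sketch. It is not the listener's actual code: the class name ApplicationStatusSketch, the ErrorSource enum, and the resolve() method are hypothetical names introduced only for illustration, and the real finishApplication() may structure the checks differently.

```java
// Hypothetical sketch: none of these names come from the tracer itself.
final class ApplicationStatusSketch {

  /** Which signal, if any, causes the application span to be marked as ERROR. */
  enum ErrorSource { NONE, THROWABLE, EXIT_CODE, JOB_FAILURE, SQL_FAILURE }

  private ApplicationStatusSketch() {}

  /**
   * Applies the priority from the commit message:
   * throwable (from caller) > exitCode != 0 > lastJobFailed > lastSqlFailed.
   */
  static ErrorSource resolve(
      Throwable throwable, int exitCode, boolean lastJobFailed, boolean lastSqlFailed) {
    if (throwable != null) {
      return ErrorSource.THROWABLE; // an explicit error from the caller wins
    }
    if (exitCode != 0) {
      return ErrorSource.EXIT_CODE; // then a non-zero exit code
    }
    if (lastJobFailed) {
      return ErrorSource.JOB_FAILURE; // then a failed Spark job
    }
    if (lastSqlFailed) {
      return ErrorSource.SQL_FAILURE; // finally a SQL call that failed before any job ran
    }
    return ErrorSource.NONE; // no error signal: the span is not marked as ERROR
  }
}
```

Under these assumptions, resolve(null, 0, false, true) yields SQL_FAILURE, while resolve(null, 0, true, true) yields JOB_FAILURE, which matches the unit-tested rule that job failures take precedence over SQL failures.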
Add SparkSqlFailureAdvice that intercepts SparkSession.sql() method calls and propagates any exceptions (e.g. AnalysisException) to the listener via the new onSqlFailure() callback. This ensures SQL analysis failures that occur before any Spark job is submitted are captured and can be reported as ERROR in the application span.
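The interception pattern described here (run the SQL call, report any exception to the listener, then let the exception propagate unchanged) can be sketched in plain Java. This is only an approximation under stated assumptions: the real SparkSqlFailureAdvice hooks into SparkSession.sql() through bytecode instrumentation, whereas the callSql() wrapper, the SqlFailureListener interface, and the RecordingListener class below are hypothetical stand-ins that need no Spark or instrumentation dependency.

```java
import java.util.function.Supplier;

// Hypothetical sketch: a plain wrapper standing in for the instrumentation advice.
final class SqlFailureInterceptionSketch {

  /** Hypothetical callback mirroring the onSqlFailure() hook named in the commit message. */
  interface SqlFailureListener {
    void onSqlFailure(Throwable error);
  }

  /** Illustrative listener that remembers that a SQL call failed. */
  static final class RecordingListener implements SqlFailureListener {
    volatile boolean lastSqlFailed;
    volatile Throwable lastError;

    @Override
    public void onSqlFailure(Throwable error) {
      lastSqlFailed = true;
      lastError = error;
    }
  }

  private SqlFailureInterceptionSketch() {}

  /**
   * Runs the SQL call; on failure, notifies the listener and rethrows the
   * same exception so the caller's behaviour is unchanged.
   */
  static <T> T callSql(Supplier<T> sqlCall, SqlFailureListener listener) {
    try {
      return sqlCall.get();
    } catch (RuntimeException | Error e) {
      listener.onSqlFailure(e); // report before any Spark job could have been submitted
      throw e;                  // never swallow the user's exception
    }
  }
}
```

The point the sketch tries to capture is that the listener is only an observer: the exception still reaches the application code exactly as it would without the instrumentation.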
Benchmarks
Startup
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 60 metrics, 11 unstable metrics.
[Gantt charts: startup time reports for insecure-bank and petclinic (global startup overhead and breakdown per module); candidate=1.61.0-SNAPSHOT~f497183167, baseline=1.61.0-SNAPSHOT~b7fb6fc123]
Load
Summary: Found 0 performance improvements and 2 performance regressions! Performance is the same for 16 metrics, 18 unstable metrics.
[Gantt charts: request duration reports for petclinic and insecure-bank (request duration, CI 0.99); candidate=1.61.0-SNAPSHOT~f497183167, baseline=1.61.0-SNAPSHOT~b7fb6fc123]
Dacapo
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.
[Gantt charts: execution time for biojava and tomcat (execution time, CI 0.99); candidate=1.61.0-SNAPSHOT~f497183167, baseline=1.61.0-SNAPSHOT~b7fb6fc123]
pawel-big-lebowski left a comment
The code looks good to me. Would it be possible to include an end-to-end test which:
- creates sparkSession = SparkSession.builder()..
- runs sparkSession.sql("""...) with a table not found, for example
- and checks if the trace contains the expected error attributes.
Similar tests exist in AbstractSpark32SqlTest.
/merge
What Does This Do
Context: SparkSession.sql() calls that throw during Catalyst analysis (e.g. AnalysisException for missing tables) fail before any Spark job or stage event fires. Our current instrumentation never sees them, so the spark.application span stays green.
Motivation
Make sure that the Spark application span is marked as an error when a call such as spark.sql().show() fails during SQL analysis.
Additional Notes
Contributor Checklist
- Assign the type: and (comp: or inst:) labels in addition to any other useful labels
- Avoid using close, fix, or any linking keywords when referencing an issue. Use solves instead, and assign the PR milestone to the issue

Jira ticket: [PROJ-IDENT]

Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.